11 research outputs found

    What Algorithms can Transformers Learn? A Study in Length Generalization

    Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity. This raises the question of if and when Transformer models can learn the true algorithm for solving a task. We study the scope of Transformers' abilities in the specific setting of length generalization on algorithmic tasks. Here, we propose a unifying framework to understand when and how Transformers can exhibit strong length generalization on a given task. Specifically, we leverage RASP (Weiss et al., 2021) -- a programming language designed for the computational model of a Transformer -- and introduce the RASP-Generalization Conjecture: Transformers tend to length generalize on a task if the task can be solved by a short RASP program which works for all input lengths. This simple conjecture remarkably captures most known instances of length generalization on algorithmic tasks. Moreover, we leverage our insights to drastically improve generalization performance on traditionally hard tasks (such as parity and addition). On the theoretical side, we give a simple example where the "min-degree-interpolator" model of learning from Abbe et al. (2023) does not correctly predict Transformers' out-of-distribution behavior, but our conjecture does. Overall, our work provides a novel perspective on the mechanisms of compositional generalization and the algorithmic capabilities of Transformers. Comment: Preprint.

    Vanishing Gradients in Reinforcement Finetuning of Language Models

    Pretrained language models are commonly aligned with human preferences and downstream tasks via reinforcement finetuning (RFT), which entails maximizing a (possibly learned) reward function using policy gradient algorithms. This work highlights a fundamental optimization obstacle in RFT: we prove that the expected gradient for an input vanishes when its reward standard deviation under the model is small, even if the expected reward is far from optimal. Through experiments on an RFT benchmark and controlled environments, as well as a theoretical analysis, we then demonstrate that vanishing gradients due to small reward standard deviation are prevalent and detrimental, leading to extremely slow reward maximization. Lastly, we explore ways to overcome vanishing gradients in RFT. We find the common practice of an initial supervised finetuning (SFT) phase to be the most promising candidate, which sheds light on its importance in an RFT pipeline. Moreover, we show that a relatively small number of SFT optimization steps on as few as 1% of the input samples can suffice, indicating that the initial SFT phase need not be expensive in terms of compute and data labeling efforts. Overall, our results emphasize that being mindful of inputs whose expected gradient vanishes, as measured by the reward standard deviation, is crucial for successful execution of RFT.

    Employing Virtualization in Library Computing: Use Cases and Lessons Learned

    This paper provides a broad overview of virtualization technology and describes several examples of its use at the University of California, San Diego Libraries. Libraries can leverage virtualization to address many long-standing library computing challenges, but careful planning is needed to determine if this technology is the right solution for a specific need. This paper outlines both technical and usability considerations, and concludes with a discussion of potential enterprise impacts on the library infrastructure

    Expression, purification, crystallization and preliminary X-ray diffraction of a novel Nitrosomonas europaea cytochrome, cytochrome P460

    Cytochrome P460 from N. europaea, a novel mono-heme protein containing an unusual lysine cross-link to the porphyrin ring, has been recombinantly expressed and purified from E. coli and crystallized. The crystals belong to the trigonal space group P3₁21 or P3₂21, with unit-cell parameters a = b = 53.3, c = 127.1 Å and one monomer in the asymmetric unit, and diffract to 1.7 Å on a Cu Kα rotating-anode X-ray source.

    Molecular and Structural Analysis of the Helicobacter pylori cag Type IV Secretion System Core Complex

    Bacterial type IV secretion systems (T4SSs) can function to export or import DNA, and can deliver effector proteins into a wide range of target cells. Relatively little is known about the structural organization of T4SSs that secrete effector proteins. In this report, we describe the isolation and analysis of a membrane-spanning core complex from the Helicobacter pylori cag T4SS, which has an important role in the pathogenesis of gastric cancer. We show that this complex contains five H. pylori proteins, CagM, CagT, Cag3, CagX, and CagY, each of which is required for cag T4SS activity. CagX and CagY are orthologous to the VirB9 and VirB10 components of T4SSs in other bacterial species, and the other three Cag proteins are unique to H. pylori. Negative stain single-particle electron microscopy revealed complexes 41 nm in diameter, characterized by a 19-nm-diameter central ring linked to an outer ring by spoke-like linkers. Incomplete complexes formed by Δcag3 or ΔcagT mutants retain the 19-nm-diameter ring but lack an organized outer ring. Immunogold labeling studies confirm that Cag3 is a peripheral component of the complex. The cag T4SS core complex has an overall diameter and structural organization that differ considerably from the corresponding features of conjugative T4SSs. These results highlight specialized features of the H. pylori cag T4SS that are optimized for function in the human gastric mucosal environment

    The Future of Cancer and Collective Intelligence in the Post-Covid World

    The Future of Cancer and Collective Intelligence in the Post-Covid World project was jointly conceived by the Innovation School at Glasgow School of Art and the Institute of Cancer Sciences at the University of Glasgow. Graduating year Product Design students from the Innovation School were presented with a challenge-based project to produce a vision of the future based on current trends that relate to the Future of Cancer and Collective Intelligence in the Post-Covid World. Currently, cancer research and development occur in isolated pockets within stages across the cancer care continuum, which often negatively impacts on the potential for cancer professionals to exchange, integrate and share data, insights and knowledge across the framework. One of the most significant societal shifts currently taking place within Cancer and Collective Intelligence is the transformation from the siloed clinic point of care model to a seamless continuum of care with greater focus on prevention and early intervention, changing what it means to be someone living with cancer and a professional working within this context. From this new dynamic, emerges the concept of living-labs; transdisciplinary communities of practice involving people working within and living with cancer, capable, through collective intelligence-enabled systems and services, of generating knowledge which can be used locally, and shared globally, to deliver focused innovations across the whole cancer ecosystem. If collective intelligence holds the potential to truly connect people to people, and people to data, across diverse communities, linking peoples’ lived experiences locally and globally, what kinds of new health and care services might emerge to improve cancer control across the continuum from prevention, detection, treatment and survivorship, and what types of new roles might emerge for citizens, patients and community groups to collaboratively drive these forward with health professionals? 
In order to address this challenge, the GSA Innovation School’s final year Product Design students and faculty formed a dynamic community of practice with cancer practitioners and researchers from the Institute of Cancer Sciences at The University of Glasgow and beyond to envisage a 2030 cancer blueprint as a series of future world exhibits, and create the designed products, services and experiences for the people who might live and work within this ecosystem. This project involved the students working in partnership with an Expert Faculty composed of Cancer Physicians, Cancer Researchers, Social Scientists, Biomedical Engineers, Health Research Specialists, Past Patients, Digital Health Specialists, Design Experts and Government Agencies. The Expert Faculty was assembled from a range of local to global organisations including the University of Glasgow, the Beatson West of Scotland Cancer Centre, the Malawi Ministry of Health and the International Agency for Research on Cancer (IARC is part of the World Health Organization). This project asked the students to embark on a speculative design exploration into future experiences of working and living with cancer ten years from now, where advances in collective intelligence have evolved to the extent that new forms and ecosystems of medical practice, cancer care and experiences of living with, through and beyond cancer transform how we interact with each other, with health professionals and the communities around us. This project was conceived and carried out during the global COVID-19 pandemic. Throughout the project the students positively used this situation to creatively embrace a digital studio practice and innovate around digital and remote access platforms and forums for collaboration, development and engagement. 
Thus, the designed products, services and experiences for the people who might live and work within the cancer ecosystem are presented as innovative, highly creative, fully immersive, experiential exhibits. The project was divided into two stages. The first was a collaborative stage based on Future Worlds. The worlds are groups of students working together on specific topics, to establish the context for their project and collaborate on research and development. These were clustered together around ‘Future Working’ and ‘Future Living’ but also joined up across these groups to create pairs of worlds, and in the process generate collective intelligence between the groups. The worlds clustered around ‘Future Working’ are Education; Care and Treatment; and Prevention and Detection. The worlds clustered around ‘Future Living’ are Personal Wellbeing; Communicating Cancer; and Beyond Cancer. The second stage saw students explore their individual response to the Future World they had been assigned in the first stage. Each student developed their own research by iteratively creating a design outcome that was appropriate to the Future cancer World. This culminated in each student producing designed products, services or systems and a communication of the future experiences created. Throughout the project, the results were presented as a series of live, interactive, digitally curated virtual work-in-progress exhibitions for specific audiences, including a special global event to participate in World Cancer Day on 4 February 2021, which allowed the students to interact and discuss the project with a global audience of cancer community leaders. The deposited materials are arranged as follows:
1. Readme files - two readme files relate to stage one and stage two of the project as outlined above.
2. Project overview document - gives a visual overview of the structure and timeline of the project.
3. Stage one data folders - the data folders for stage one of the project are named by the six Future Worlds through which each group explored possible futures.
4. Stage two data folders - the data folders for stage two of the project are named for the individual students who conducted the work and organised by the Future World cluster they worked within.